ORTHOGONAL PATTERNS IN BINARY NEURAL NETWORKS


Yoram Baram* 


Abstract 


A binary neural network that stores only mutually orthogonal patterns is shown 
to converge, when probed by any pattern, to a pattern in the memory space — the space 
spanned by the stored patterns. The latter are shown to be the only members of the 
memory space under a certain coding condition, which allows maximal storage of 
$M = (2N)^{0.5}$ patterns, where N is the number of neurons. The stored patterns are
shown to have basins of attraction of radius N/(2M), within which errors are cor- 
rected with probability 1 in a single update cycle. When the probe falls outside 
these regions, the error correction probability can still be increased to 1 by 
repeatedly running the network with the same probe. 


1. Introduction


A mathematical model for biological neural networks, consisting of Hebb's 
storage mechanism [1] and McCulloch-Pitts' retrieval mechanism [2], was shown by 
Hopfield [3] not only to exhibit the collective behavior of the network as an asso- 
ciative (content-addressable) memory, but also to be technologically realizable. 

The model consists of N variables $x_1, \ldots, x_N$, corresponding to the neurons of the
network, each capable of having one of two values ±1. The state of the network is
defined as the vector $x = (x_1, \ldots, x_N)^T$, where $(\cdot)^T$ denotes transpose. The
information in M given patterns $x^{(1)}, \ldots, x^{(M)}$ is stored in synaptic parameters,
which are calculated according to the Hebbian rule [1]

$$ T_{ij} = \sum_{\ell=1}^{M} x_i^{(\ell)} x_j^{(\ell)} \tag{1.1a} $$


Information retrieval is initiated by a probe (initial state) x(0). Neurons are
then selected at random, one at a time, and their states are updated according to
the McCulloch-Pitts rule [2]

$$ x_i(k+1) = \mathrm{sgn}\Big( \sum_{j=1}^{N} T_{ij} x_j(k) \Big) \tag{1.1b} $$
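The storage rule (1.1a) and the retrieval rule (1.1b) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the four-neuron example patterns, the fixed number of updates, and the choice to leave a neuron unchanged when its input sum is zero are assumptions made here, not part of the model description.

```python
import random

def hebbian_weights(patterns):
    """T[i][j] = sum over the stored patterns of x_i * x_j, as in (1.1a)."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) for j in range(n)]
            for i in range(n)]

def update_neuron(T, x, i):
    """One McCulloch-Pitts update (1.1b) of neuron i, in place."""
    total = sum(T[i][j] * x[j] for j in range(len(x)))
    if total > 0:
        x[i] = 1
    elif total < 0:
        x[i] = -1
    # total == 0: the state is left unchanged (assumption of this sketch)

def run(T, probe, sweeps=10, seed=0):
    """Update randomly selected neurons, one at a time, a fixed number of times."""
    rng = random.Random(seed)
    x = list(probe)
    for _ in range(sweeps * len(x)):
        update_neuron(T, x, rng.randrange(len(x)))
    return x

# Two mutually orthogonal example patterns with N = 4 (hypothetical data).
patterns = [[1, 1, 1, 1], [1, -1, 1, -1]]
T = hebbian_weights(patterns)
recalled = run(T, patterns[1])   # a stored orthogonal pattern stays put
```

Probing with a stored pattern returns the same pattern here, because for orthogonal patterns every input sum is N times the corresponding component of that pattern.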



The network was shown by Hopfield [3] to be globally stable in the sense that, 
initialized by any probe, it will converge to some final state. He also observed by 
simulation that the stored patterns can be retrieved without severe error if M 
does not exceed 0.15 N, for N = 100. McEliece et al. [4] and Bruce et al. [5]
independently showed that the number of random patterns that can be retrieved with
finite probability cannot exceed N/(2 log N). Hopfield further observed by simulation
that "for N = 100, a pair of random memories should be separated by
50 ± 5 Hamming units" [3]. When the stored patterns differ by half the bits or
neuron values, they are orthogonal in the Euclidean sense. The construction of
orthogonal patterns requires preprocessing or encoding of information. Neural
encoding mechanisms, which involve certain notions of pattern orthogonalization,
have been suggested by Kohonen [6] and by Grossberg [7]. Decoding coded patterns by 
neural networks has been suggested by Platt and Hopfield [8] for communication 
purposes and by Chiueh and Goodman [9] for pattern classification. 

In this paper we first observe that when the stored patterns are mutually 
orthogonal, they are equilibrium states of the network (1.1). Then we propose a 
slight modification of the model, which allows storage only of mutually orthogonal 
patterns. The network state, initialized by any probe, is shown to converge to a 
pattern in the space spanned by the stored patterns, which we call the memory 
space. There can be, at most, N orthogonal patterns. It is shown that when the 
stored patterns satisfy a certain coding condition, they are the only members of the 
memory space. The maximum number of such code words is shown to be $(2N)^{0.5}$, which
is in agreement with Hopfield's empirical observation. A particular code construction
method is proposed. A network loaded with such a code acts as a decoder. The
stored patterns are shown to have basins of attraction of radius N/(2M). When
initialized within this range of a stored pattern, the network state converges with
probability 1 to that pattern in less than a neural update cycle time. When the
probe falls outside this range, the probability of retrieving the nearest stored 
pattern can still be increased to 1 by repeatedly running the network with the same 
probe. 


2. The Memory Space 

For a stored pattern to be retrievable, it must be an equilibrium point of the
network, that is, a state that, once reached, is never left. From (1.1) we have

$$ T x^{(\ell)} = \sum_{i=1}^{M} x^{(i)} [x^{(i)}]^T x^{(\ell)} $$

Let the Hamming distance between $x^{(i)}$ and $x^{(\ell)}$ be denoted by $d[x^{(i)}, x^{(\ell)}]$ and let
$r[x^{(i)}, x^{(\ell)}] = (1/N)\, d[x^{(i)}, x^{(\ell)}]$. It can be readily verified that

$$ [x^{(i)}]^T x^{(\ell)} = N - 2 d[x^{(i)}, x^{(\ell)}] = N \{ 1 - 2 r[x^{(i)}, x^{(\ell)}] \} $$

hence

$$ T x^{(\ell)} = \sum_{i=1}^{M} N \{ 1 - 2 r[x^{(i)}, x^{(\ell)}] \} x^{(i)} = N x^{(\ell)} + \sum_{i \ne \ell} N \{ 1 - 2 r[x^{(i)}, x^{(\ell)}] \} x^{(i)} \tag{2.1} $$


It can be seen that $T x^{(\ell)}$ has the same sign as $x^{(\ell)}$ unless the second term on the
right-hand side of (2.1) offsets the first. Such offset cannot happen if

$$ r[x^{(i)}, x^{(\ell)}] = \tfrac{1}{2} \quad \text{for all } i \ne \ell \tag{2.2} $$


This is a sufficient condition for the stored patterns to be equilibrium points of 
the network. It is not difficult to see that this condition is equivalent to the 
orthogonality condition 


$$ [x^{(i)}]^T x^{(\ell)} = 0 \quad \text{for all } i \ne \ell. $$


A question of interest is how the orthogonality condition can be maintained by the
neural network. Denoting by T(n) the matrix of synaptic parameters corresponding
to n stored patterns, let us modify the storage rule to be

$$ T(n+1) = \begin{cases} T(n) + x^{(n+1)} [x^{(n+1)}]^T & \text{if } T(n)\, x^{(n+1)} = 0 \\ T(n) & \text{otherwise} \end{cases} \tag{2.3a} $$


and denoting

$$ z_i = \sum_{j=1}^{N} T_{ij} x_j $$

let us modify the neuron update rule to be

$$ x_i = \begin{cases} +1 & \text{if } z_i > 0 \\ -1 & \text{if } z_i < 0 \\ x_i & \text{if } z_i = 0 \end{cases} \tag{2.3b} $$



These slight modifications of (1.1a) and (1.1b) have the physical interpretation 
that the synaptic parameters remain unchanged by the probe if there is an energy 
release by neural firing activity. This can only occur according to (2.3b) if the 
probe is not orthogonal to some of the previously stored patterns. If the probe is 
orthogonal to all the stored patterns, that is, when it is in the null space of T, 
energy release by firing cannot take place and, instead, relief of potential energy 
is provided by a change in the synapses, meaning that the probe is stored as a new 
pattern. Near orthogonality may be represented by the condition $|z_i| < \epsilon$ for a
small integer $\epsilon$. In the rest of the paper we assume strict orthogonality of the
stored patterns for mathematical simplicity. We next show that complete energy 
release means convergence into the space spanned by the stored patterns, which, 
according to the mechanism (2.3b), cannot occur if the probe is orthogonal to these 
patterns. 
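The modified rules (2.3a) and (2.3b) can be illustrated with a short Python sketch. The helper names `field`, `store` and `update`, as well as the eight-neuron example patterns, are hypothetical choices made for this illustration.

```python
def field(T, x):
    """z_i = sum_j T_ij x_j for every neuron i."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]

def store(T, x):
    """Rule (2.3a): add x x^T to T only if T x = 0. Returns True if x was stored."""
    if any(z != 0 for z in field(T, x)):
        return False          # x is not orthogonal to the stored patterns
    n = len(x)
    for i in range(n):
        for j in range(n):
            T[i][j] += x[i] * x[j]
    return True

def update(T, x, i):
    """Rule (2.3b): neuron i follows the sign of z_i and is kept when z_i = 0."""
    z = sum(t * v for t, v in zip(T[i], x))
    if z > 0:
        x[i] = 1
    elif z < 0:
        x[i] = -1

N = 8
T = [[0] * N for _ in range(N)]
p1 = [1] * 8
p2 = [1, 1, 1, 1, -1, -1, -1, -1]      # orthogonal to p1
p3 = [1, 1, 1, 1, 1, 1, 1, -1]         # not orthogonal to p1
stored = [store(T, p) for p in (p1, p2, p3)]

probe = [-1, 1, 1, 1, 1, 1, 1, 1]      # p1 with its first bit flipped
update(T, probe, 0)                    # z_0 = 4 > 0, so the bit is restored
```

In this example the first two candidates are stored and the third is rejected, since its input sums are not all zero.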

The state space of the network is the collection of all vectors of dimension 
N, whose components have values ±1. We define the network's memory space, denoted 
by X, as the subspace of the state space, which is spanned by the stored patterns, 
that is, the vectors in the state space that can be obtained as linear combinations 
of the stored patterns. The orthogonal projection of an arbitrary pattern x on a 
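stored pattern $x^{(i)}$ is given by

$$ \hat{x}^{(i)} = \frac{x^T x^{(i)}}{\| x^{(i)} \|^2} \, x^{(i)} = \frac{1}{N} \, x^{(i)} [x^{(i)}]^T x $$

where $\| x \| = (x^T x)^{0.5}$ denotes the Euclidean norm of x. It can be seen that

$$ T x = \sum_{i=1}^{M} x^{(i)} [x^{(i)}]^T x = N \sum_{i=1}^{M} \hat{x}^{(i)} = N \hat{x} $$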


where $\hat{x}$ is the projection of x on X and the last equality follows from the
fact that the stored patterns $x^{(i)}$, $i = 1, \ldots, M$, are mutually orthogonal.

The energy function

$$ E(x) = -x^T T x $$

can be seen to have the value

$$ E[x^{(\ell)}] = -N^2 $$

for each of the stored patterns. For an arbitrary pattern x, it has the value

$$ E(x) = -N x^T \hat{x} $$

By the Cauchy-Schwarz inequality

$$ | x^T \hat{x} | \le \| x \| \, \| \hat{x} \| $$

Suppose that x does not belong to X, then

$$ \| \hat{x} \| < \| x \| $$

yielding, since $\| x \| = N^{0.5}$,

$$ x^T \hat{x} < N $$

hence,

$$ E(x) > -N^2 $$

It follows that for a state x that does not belong to X, the energy is not minimal.
On the other hand, if x belongs to X (not necessarily $x = x^{(\ell)}$), then

$$ \hat{x} = x $$

yielding

$$ E(x) = -N x^T x = -N^2 $$

We have shown that the minimal value of the energy function is $-N^2$. All points in
the memory space X have this minimal energy value, while points outside the memory 
space have higher energy. It was shown by Hopfield [3] that the energy decreases 
along any path in the state space of his network. McEliece et al. [4] elaborated on 
Hopfield's analysis and showed that the energy can remain unchanged for only a
finite number of steps. This implies that the Hopfield network converges to a point 
of minimal energy. Since the algorithm (2.3b) differs from the McCulloch-Pitts 
algorithm only by the transformation of points for which, initially, Tx = 0 into 
new stored patterns having minimum energy, it converges, as the Hopfield model, to 
minimum energy points. It follows that for orthogonal stored patterns the network 
will converge to a point in the memory space. 
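The relations Tx = N x̂ and E(x) = -x^T T x can be checked numerically on a small case. The sketch below assumes two orthogonal example patterns with N = 4; the helper names and the probe are illustrative choices, and exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

patterns = [[1, 1, 1, 1], [1, -1, 1, -1]]   # hypothetical stored patterns
N = len(patterns[0])

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def apply_T(x):
    """T x = sum_i x^(i) ([x^(i)]^T x), without forming T explicitly."""
    return [sum(p[j] * dot(p, x) for p in patterns) for j in range(N)]

def projection(x):
    """Orthogonal projection of x on the span of the stored patterns."""
    return [sum(Fraction(dot(p, x), N) * p[j] for p in patterns)
            for j in range(N)]

def energy(x):
    """E(x) = -x^T T x."""
    return -dot(x, apply_T(x))

x = [1, 1, 1, -1]                 # a state outside the memory space
x_hat = projection(x)             # equals (1, 0, 1, 0)
Tx = apply_T(x)                   # equals N * x_hat

# Energies of all 2^N states; the minimum -N^2 is attained only in the span.
energies = {s: energy(s) for s in product([1, -1], repeat=N)}
minimal = sorted(s for s, e in energies.items() if e == -N * N)
```

For this example the states of minimal energy are the two stored patterns and their sign-reversed copies, which also lie in the span of the stored patterns.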


3. Perfect Storage 

We next show that under a certain condition on the stored patterns the memory 
space contains only these patterns. This situation may be characterized as "perfect 
storage." The stored patterns can then define a set of code words or, simply, a 
code that can be used for information representation. Suppose that the scalars 
$c_1, c_2, \ldots, c_M$ satisfy the equation

$$ c_1 x^{(1)} + c_2 x^{(2)} + \cdots + c_M x^{(M)} = x \tag{3.1} $$

where $x^{(1)}, \ldots, x^{(M)}$ are the stored patterns and x is a permissible state of the
network, that is, a vector whose components have values ±1. Then, defining the
matrix

$$ A = [\, x^{(1)} : x^{(2)} : \cdots : x^{(M)} \,] $$

and the vector

$$ c = [\, c_1 \;\; c_2 \;\; \cdots \;\; c_M \,]^T $$

we have

$$ A c = x \tag{3.2} $$

yielding

$$ c^T A^T A c = x^T x = N \tag{3.3} $$

But since

$$ A^T A = N I_M $$

where $I_M$ is the M×M identity matrix, it follows from (3.3) that

$$ c^T c = c_1^2 + c_2^2 + \cdots + c_M^2 = 1 \tag{3.4} $$

Denoting by $x_j^{(i)}$ the jth element of $x^{(i)}$, the square of the jth row of (3.1) can
be written as

$$ \sum_{i,k} x_j^{(i)} x_j^{(k)} c_i c_k = 1 $$

or

$$ \sum_{i} c_i^2 + \sum_{i \ne k} x_j^{(i)} x_j^{(k)} c_i c_k = 1 $$

which, by (3.4) yields

$$ \sum_{i \ne k} x_j^{(i)} x_j^{(k)} c_i c_k = 0 \tag{3.5} $$


The memory space will contain only the stored patterns if the only solution to (3.5)
is

$$ c_i c_k = 0 \quad \text{for } i, k = 1, \ldots, M, \; i \ne k \tag{3.6} $$

which means that, at most, one of the coefficients $c_i$ is non-zero. Defining the
scalars

$$ y_j^{(i,k)} = x_j^{(i)} x_j^{(k)} \tag{3.7} $$

the vectors

$$ y^{(i,k)} = [\, y_1^{(i,k)} \;\; y_2^{(i,k)} \;\; \cdots \;\; y_N^{(i,k)} \,]^T \tag{3.8} $$

and the matrix

$$ Y = [\, y^{(1,2)} : y^{(1,3)} : \cdots : y^{(1,M)} : y^{(2,3)} : y^{(2,4)} : \cdots : y^{(M-1,M)} \,] \tag{3.9} $$

and also denoting

$$ \tilde{c} = [\, c_1 c_2 \;\; c_1 c_3 \;\; \cdots \;\; c_1 c_M \;\; c_2 c_3 \;\; c_2 c_4 \;\; \cdots \;\; c_{M-1} c_M \,]^T \tag{3.10} $$

equation (3.5) can be written for all j, i, k as

$$ Y \tilde{c} = 0 \tag{3.11} $$

This equation has only the zero solution (3.6) if and only if Y has full column 
rank. The number of columns in Y is given by 

$$ (M - 1) + (M - 2) + \cdots + 1 = M(M - 1)/2 $$

It follows that the memory space will contain only the stored patterns if the condition

$$ \text{col. rank } Y = M(M - 1)/2 \tag{3.12} $$

is satisfied. Since the number of rows of Y is N, a necessary condition for Y
to have full column rank is

$$ M(M - 1) \le 2N \tag{3.13} $$

For large M, condition (3.13) may be written as

$$ M \le (2N)^{0.5} \tag{3.14} $$

We note that for N = 100, the latter condition takes the value M ≤ 14, which is in
agreement with the capacity bound obtained empirically by Hopfield [3]. Even when
the latter condition is satisfied, condition (3.12) is not guaranteed to hold
for every choice of orthogonal patterns, as illustrated by the following examples.
Consider first the orthogonal patterns formed by the columns of the matrix (where
the symbol 1 has been omitted)


    H = [matrix of + and - entries with three mutually orthogonal columns; its entries are not legible in the source]

which defines a Hadamard code (see, e.g., [10], pg. 44). It can be readily
verified that if the stored patterns are the first two columns of H, they are the 
only patterns in the memory space. Indeed, if 

$$ c_1 x^{(1)} + c_2 x^{(2)} = x $$

where x is permissible, then

$$ (c_1 + c_2)^2 = 1 $$

and

$$ (c_1 - c_2)^2 = 1 $$

yielding

$$ c_1 c_2 = 0 $$

Hence, either $c_1 = 0$ or $c_2 = 0$ and since $c_1^2 + c_2^2 = 1$, the assertion follows.
Similarly, if all three columns of H are stored, they are the only patterns in the 
memory space, as the product matrix Y, obtained as 




    Y = [product matrix with three columns; its entries are not legible in the source]


can be seen to have mutually orthogonal columns. 

Next, suppose that the stored patterns are the columns of the matrix

        [ + + + + ]
        [ + - + - ]
        [ + + - - ]
    H = [ + - - + ]
        [ + + + + ]
        [ + - + - ]
        [ + + - - ]
        [ + - - + ]

which are the first four columns of the Hadamard matrix obtained by the Sylvester
matrix construction (see [10], p. 45). It can be seen that the product matrix

        [ + + + + + + ]
        [ - + - - + - ]
        [ + - - - - + ]
    Y = [ - - + + - - ]
        [ + + + + + + ]
        [ - + - - + - ]
        [ + - - - - + ]
        [ - - + + - - ]

does not have full column rank (the third and the fourth columns are the same and so
are the second and the fifth and the first and the sixth). On the other hand,
replacing the last column of H by another, as in


        [ + + + + ]
        [ + - + + ]
        [ + + - + ]
    H = [ + - - + ]                                                    (3.15)
        [ + + + - ]
        [ + - + - ]
        [ + + - - ]
        [ + - - - ]


yields the product matrix 


        [ + + + + + + ]
        [ - + + - - + ]
        [ + - + - + - ]
    Y = [ - - + + - - ]
        [ + + - + - - ]
        [ - + - - + - ]
        [ + - - - - + ]
        [ - - - + + + ]


which can be seen to have mutually orthogonal columns. 
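The rank condition (3.12) for the two N = 8 examples can be checked numerically. The sketch below is an illustration under stated assumptions: it builds the Sylvester matrix in its usual recursive ordering, takes the replacement column in (3.15) to be the fifth Sylvester column, and uses hypothetical helper names (`sylvester`, `product_matrix`, `rank`). The rank is computed by exact Gaussian elimination.

```python
from fractions import Fraction
from itertools import combinations

def sylvester(k):
    """Sylvester Hadamard matrix of order 2**k as a list of rows."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def product_matrix(patterns):
    """Rows of Y in (3.9): one row per neuron j, one column per pair i < k."""
    pairs = list(combinations(range(len(patterns)), 2))
    return [[patterns[i][j] * patterns[k][j] for i, k in pairs]
            for j in range(len(patterns[0]))]

def rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

H8 = sylvester(3)                       # symmetric, so row c equals column c
first_four = [H8[c] for c in (0, 1, 2, 3)]
modified = [H8[c] for c in (0, 1, 2, 4)]
rank_first_four = rank(product_matrix(first_four))   # 3: repeated columns
rank_modified = rank(product_matrix(modified))       # 6 = M(M - 1)/2
```

With M = 4 the product matrix has six columns, so only the modified code meets condition (3.12).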

The above example shows that the satisfaction of condition (3.12) is not guaranteed
by orthogonality of the code words but, rather, depends on the particular
choice of the code. The question arises whether there is a systematic code construction
that guarantees satisfaction of the condition. We next suggest such a
construction. Let $N = 2^k$ for some positive integer k and suppose, without loss
of generality, that the first code word is $(+ + \cdots +)^T$. Divide the first word
into two equal sections and change the signs of the bits in one section. Divide
each of these sections again into two equal sections and change the signs in one
section of each. This operation can be repeated only k times and results in k + 1
mutually orthogonal code words. Since the bit-wise products of the resulting code words
maintain the original division modulo sign change, the resulting product words are
mutually orthogonal. Hence, the desired condition is satisfied. Such code construction
for N = 8 yields


        [ + + + + ]
        [ + + + - ]
        [ + + - + ]
    H = [ + + - - ]
        [ + - + + ]
        [ + - + - ]
        [ + - - + ]
        [ + - - - ]


which can be seen to include the same code words as (3.15). The latter was shown 
above to satisfy the condition. 
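The halving construction can be written as a short Python sketch. The function name `halving_code` and the convention that the second half of each section is the one whose signs are changed are assumptions of this illustration.

```python
from itertools import combinations

def halving_code(k):
    """Return the k + 1 code words of length N = 2**k from repeated halving."""
    n = 2 ** k
    words = [[1] * n]                       # first code word: all +
    for level in range(1, k + 1):
        block = n // 2 ** level             # section length at this level
        # Change the signs in the second half of every section of length 2*block.
        words.append([1 if (j // block) % 2 == 0 else -1 for j in range(n)])
    return words

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

words = halving_code(3)                     # N = 8, four code words

# Bit-wise products of all pairs of code words (the columns of Y).
products = [[u * v for u, v in zip(a, b)] for a, b in combinations(words, 2)]

words_orthogonal = all(dot(a, b) == 0 for a, b in combinations(words, 2))
products_orthogonal = all(dot(a, b) == 0 for a, b in combinations(products, 2))
```

For k = 3 both checks hold, so the six product words are linearly independent and condition (3.12) is met.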

4. Error Correction 


We have seen that under condition (3.12) the memory space contains only the 
stored patterns. Consequently, the network, probed by any pattern, will converge to 
one of the stored patterns. If the latter are viewed as code words, the probe may 
be viewed as such a word corrupted by noise. Convergence to the code word closest 
to the probe in Hamming distance then means error correction. We next show that the 
network has the capability of correcting a certain number of bits, with probabil- 
ity 1. This number defines the "basins of attraction" of the stored patterns. 
Suppose that $x^{(\ell)}$ is the stored pattern closest to the probe x. We have

$$ T x = \sum_{j=1}^{M} \{ N - 2 d[x^{(j)}, x] \} x^{(j)} = \{ N - 2 d[x^{(\ell)}, x] \} x^{(\ell)} + \sum_{j \ne \ell} \{ N - 2 d[x^{(j)}, x] \} x^{(j)} \tag{4.1} $$

Let us use the abbreviated notation $d = d[x^{(\ell)}, x]$. Suppose that the i'th neuron
is to be updated. The distance d will decrease or remain the same if and only if
the sign of $[Tx]_i$ is the same as that of $[x^{(\ell)}]_i$. In the worst case all $[x^{(j)}]_i$, $j \ne \ell$,
have the same sign, opposite to that of $[x^{(\ell)}]_i$, and

$$ d[x^{(j)}, x] = \tfrac{1}{2} N - d \quad \text{for all } j \ne \ell $$

which yields the maximum offset of the first term on the right-hand side of (4.1) by
the second. In this situation, the sign of $[Tx]_i$ will be the same as that of
$[x^{(\ell)}]_i$ if and only if

$$ (N - 2d) > (M - 1) [ N - 2 ( \tfrac{1}{2} N - d ) ] $$

or,

$$ d < \frac{N}{2M} \tag{4.2} $$


When a code word is corrupted so that, at most, N/(2M) of its bits are erroneous, it 
will be corrected with probability 1 by the network in a single neural update 
cycle. The neighborhood within a distance N/(2M) about a stored pattern is its 
"basin of attraction." When the probe falls outside this range, the network's state 
may still converge with high probability to the closest stored pattern, depending on 
its distance from the probe. 
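The bound (4.2) can be illustrated on a small case. The sketch below assumes the halving code with N = 16 and M = 5, so that N/(2M) = 1.6 and a single erroneous bit lies inside the basin of attraction. The helper names and the reading of one update cycle as a single pass over all neurons in random order are assumptions of this illustration.

```python
import random

def halving_code(k):
    """k + 1 mutually orthogonal code words of length 2**k (halving construction)."""
    n = 2 ** k
    words = [[1] * n]
    for level in range(1, k + 1):
        block = n // 2 ** level
        words.append([1 if (j // block) % 2 == 0 else -1 for j in range(n)])
    return words

def hebbian_weights(patterns):
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) for j in range(n)]
            for i in range(n)]

def sweep(T, x, rng):
    """Update every neuron once, in random order, following rule (2.3b)."""
    order = list(range(len(x)))
    rng.shuffle(order)
    for i in order:
        z = sum(t * v for t, v in zip(T[i], x))
        if z > 0:
            x[i] = 1
        elif z < 0:
            x[i] = -1

words = halving_code(4)            # N = 16, M = 5, N/(2M) = 1.6
T = hebbian_weights(words)
rng = random.Random(0)

# Flip each bit of each code word in turn and run one sweep.
recovered = []
for w in words:
    for bit in range(len(w)):
        probe = list(w)
        probe[bit] = -probe[bit]
        sweep(T, probe, rng)
        recovered.append(probe == w)
```

Every single-bit error is corrected within one sweep in this example, in line with d = 1 < N/(2M).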

Suppose that the network is rerun repeatedly with the same probe. The final 
state of the first run is registered. The final state of each consecutive run 
replaces that of the previous one if it is closer to the probe and is discarded 
otherwise. Since the neuron selections for update are mutually independent, so are 
the resulting state trajectories in the different network runs. Each run terminates 
in one of the stored patterns. Since the probability of converging to the pattern
closest to the probe is at least as large as that of converging to any other stored
pattern, it is strictly positive (at least 1/M, since each run ends in one of the
M stored patterns). It follows that the probability of recovering the stored
pattern closest to the probe increases to 1 as the number of runs is increased to
infinity.
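The rerun procedure can be sketched as follows. This is a minimal illustration, assuming that a run is considered finished once a full pass over the neurons changes no state; the helper names, the cap on the number of passes, and the N = 16 example code are choices made here for illustration only.

```python
import random

def halving_code(k):
    n = 2 ** k
    words = [[1] * n]
    for level in range(1, k + 1):
        block = n // 2 ** level
        words.append([1 if (j // block) % 2 == 0 else -1 for j in range(n)])
    return words

def hebbian_weights(patterns):
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) for j in range(n)]
            for i in range(n)]

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def run_to_rest(T, probe, rng, max_sweeps=100):
    """Apply rule (2.3b) in random order until a full pass changes nothing."""
    x = list(probe)
    for _ in range(max_sweeps):
        changed = False
        order = list(range(len(x)))
        rng.shuffle(order)
        for i in order:
            z = sum(t * v for t, v in zip(T[i], x))
            new = 1 if z > 0 else -1 if z < 0 else x[i]
            if new != x[i]:
                x[i] = new
                changed = True
        if not changed:
            break
    return x

def best_of_runs(T, probe, runs, seed=0):
    """Rerun with the same probe and keep the final state closest to it."""
    rng = random.Random(seed)
    best = run_to_rest(T, probe, rng)
    for _ in range(runs - 1):
        final = run_to_rest(T, probe, rng)
        if hamming(final, probe) < hamming(best, probe):
            best = final        # closer final state replaces the registered one
    return best

words = halving_code(4)
T = hebbian_weights(words)
probe = list(words[2])
probe[5] = -probe[5]            # one erroneous bit
best = best_of_runs(T, probe, runs=5)
```

The probe used here lies inside the basin of attraction, so every run ends in the same code word; for probes farther away the runs may end in different states, and the registered state is the one closest to the probe.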
5. Conclusion


The storage of mutually orthogonal patterns in a binary neural network guarantees
convergence of the network state, initialized by any pattern, to a pattern in
the memory space. Under a certain coding condition, the memory space contains only
the stored patterns. The state converges to the nearest stored pattern with probability
1 when it is initialized within the latter's basin of attraction. Otherwise,
the probability of error correction can be increased asymptotically to 1 by
repeatedly running the network with the same probe.